The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Online personalized recommendation services are generally hosted in the cloud where users query the cloud-based model to receive recommended input such as merchandise of interest or news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, there was not enough attention paid to the potential information leakage through sparse features. These sparse features are employed to track users' behavior, e.g., their click history, object interactions, etc., potentially carrying each user's private information. Sparse features are represented as learned embedding vectors that are stored in large tables, and personalized recommendation is performed by using a specific user's sparse feature to index through the tables. Even with recently-proposed methods that hides the computation happening in the cloud, an attacker in the cloud may be able to still track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models in an untrusted cloud, followed by a demonstration of how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.
translated by 谷歌翻译
Error correction in automatic speech recognition (ASR) aims to correct those incorrect words in sentences generated by ASR models. Since recent ASR models usually have low word error rate (WER), to avoid affecting originally correct tokens, error correction models should only modify incorrect words, and therefore detecting incorrect words is important for error correction. Previous works on error correction either implicitly detect error words through target-source attention or CTC (connectionist temporal classification) loss, or explicitly locate specific deletion/substitution/insertion errors. However, implicit error detection does not provide clear signal about which tokens are incorrect and explicit error detection suffers from low detection accuracy. In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection. Specifically, we first detect whether a token is correct or not through a probability produced by a dedicatedly designed language model, and then design a constrained CTC loss that only duplicates the detected incorrect tokens to let the decoder focus on the correction of error tokens. Compared with implicit error detection with CTC loss, SoftCorrect provides explicit signal about which words are incorrect and thus does not need to duplicate every token but only incorrect tokens; compared with explicit error detection, SoftCorrect does not detect specific deletion/substitution/insertion errors but just leaves it to CTC loss. Experiments on AISHELL-1 and Aidatatang datasets show that SoftCorrect achieves 26.1% and 9.4% CER reduction respectively, outperforming previous works by a large margin, while still enjoying fast speed of parallel generation.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
DeepMind的游戏理论与多代理团队研究多学科学习的几个方面,从计算近似值到游戏理论中的基本概念,再到在富裕的空间环境中模拟社会困境,并在困难的团队协调任务中培训3-D类人动物。我们小组的一个签名目的是使用DeepMind在DeepMind中提供的资源和专业知识,以深入强化学习来探索复杂环境中的多代理系统,并使用这些基准来提高我们的理解。在这里,我们总结了我们团队的最新工作,并提出了一种分类法,我们认为这重点介绍了多代理研究中许多重要的开放挑战。
translated by 谷歌翻译
随着在高风险决策中引入机器学习,确保算法公平已成为越来越重要的问题。为此,已经提出了许多关于公平性的数学定义,并且已经开发了多种优化技术,所有这些都旨在最大化明确的公平概念。但是,公平解决方案取决于训练数据的质量,并且对噪声高度敏感。最近的研究表明,鲁棒性(模型在看不见的数据上表现良好的能力)在解决新问题时应使用的策略类型起着重要作用,因此,测量这些策略的鲁棒性已成为一种基本问题。因此,在这项工作中,我们提出了一个新标准,以衡量各种公平优化策略的鲁棒性 - \ textit {稳健性比率}。我们使用三种最受欢迎​​的公平策略在五个最受欢迎的公平定义方面,在五个基准标记公平数据集上进行了多次广泛的实验。我们的实验从经验上表明,依赖阈值优化的公平方法对所有评估的数据集中的噪声非常敏感,尽管大多数表现优于其他方法。这与其他两种方法相反,这对于低噪声方案而言不太公平,但对于高噪声方案而言更公平。据我们所知,我们是第一个定量评估公平优化策略的鲁棒性的人。这可以作为选择各种数据集的最合适的公平策略的指南。
translated by 谷歌翻译
本文提出了一种深入学习辅助合成方法,用于使用3D EM结构的RF / MM波被动匹配网络直接端到端生成。与从目标电路分量值和目标拓扑结构合成EM结构的现有方法不同,我们所提出的方法实现了从所需性能值的网络拓扑到输入的网络拓扑的直接合成。我们在片上1:1个变压器的阻抗匹配网络上展示所提出的合成神经网络(NN)模型。通过利用参数共享,综合NN模型成功提取了输入阻抗和负载电容器的相关特征,并在45nm的SOI进程中预测了变压器3D EM几何体,该过程将与标准50 $ \ Omega $负载匹配目标输入阻抗吸收两个装载电容器。作为概念验证,合成了几个示例变压器几何形状,并在ANSYS HFS中验证以提供所需的输入阻抗。
translated by 谷歌翻译
传统的深度传感器产生准确的真实世界深度估计,即使仅在仿真域训练的最先进的学习方法也会超越。由于在模拟域中容易获得地面真理深度,但在真实域中很难获得,因此我们提出了一种利用两个世界的最佳方法的方法。在本文中,我们展示了一个新的框架,ActiveZero,这是一个混合域学习解决方案,适用于不需要真实世界深度注释的活动立体宽度系统。首先,我们通过使用混合域学习策略来证明我们的方法对分发外数据的可转换性。在仿真域中,我们在形状原语数据集上使用监督差异丢失和自我监督损失的组合。相比之下,在真实域中,我们只在数据集中使用自我监督损失,这些损失是从培训仿真数据或测试真实数据的分发。其次,我们的方法介绍了一种名为Temporal IR的自我监督损失,以增加我们在难以感知地区的重新注入的鲁棒性和准确性。最后,我们展示了如何训练该方法的端到端,并且每个模块对于获得最终结果很重要。关于真实数据的广泛定性和定量评估表明了甚至可以击败商业深度传感器的最新状态。
translated by 谷歌翻译
由于筛选乳房X线照片的假阴性评估,通常在晚期检测到与其他癌症更差的间隔和大型侵入性乳腺癌。错过的筛选时间检测通常由其周围乳腺组织模糊的肿瘤引起的,这是一种称为掩蔽的现象。为了研究和基准爆发癌症的乳房Xmmpare掩蔽,在这项工作中,我们引入CSAW-M,最大的公共乳房数据集,从10,000多个人收集并用潜在的掩蔽注释。与以前的方法对比测量乳房图像密度作为代理的方法,我们的数据集直接提供了五个专家屏蔽潜在评估的注释。我们还培训了CSAW-M的深入学习模型来估计掩蔽水平,并显示估计的掩蔽更加预测筛查患有间隔和大型侵入性癌症的参与者 - 而不是明确培训这些任务 - 而不是其乳房密度同行。
translated by 谷歌翻译
Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence, and can further reduce the word error rate (WER). Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens. In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. FastCorrect 2 adopts non-autoregressive generation for fast inference, which consists of an encoder that processes multiple source sentences and a decoder that generates the target sentence in parallel from the adjusted source sentence, where the adjustment is based on the predicted duration of each source token. However, there are some issues when handling multiple source sentences. First, it is non-trivial to leverage the voting effect from multiple source sentences since they usually vary in length. Thus, we propose a novel alignment algorithm to maximize the degree of token alignment among multiple sentences in terms of token and pronunciation similarity. Second, the decoder can only take one adjusted source sentence as input, while there are multiple source sentences. Thus, we develop a candidate predictor to detect the most suitable candidate for the decoder. Experiments on our inhouse dataset and AISHELL-1 show that FastCorrect 2 can further reduce the WER over the previous correction model with single candidate by 3.2% and 2.6%, demonstrating the effectiveness of leveraging multiple candidates in ASR error correction. FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline and can serve as a unified post-processing module for ASR.
translated by 谷歌翻译